Quantization Support #46

saileshd1402 · 2024-01-03T08:16:16Z

Changes:

Added support for bitsandbytes quantization at 4-bit and 8-bit precision. (The number of bits can be passed through the '-q' argument in llm/run.sh.)
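For readers new to the feature, here is a minimal sketch of how a precision in bits could map to model-loading options. It assumes the handler forwards these keyword arguments to transformers' `from_pretrained`, which accepts `load_in_8bit` / `load_in_4bit` flags backed by bitsandbytes (newer transformers releases prefer passing `quantization_config=BitsAndBytesConfig(...)` with the same flags). The helper name `quantization_kwargs` is hypothetical and not part of this PR.

```python
from typing import Optional


def quantization_kwargs(quantize_bits: Optional[int]) -> dict:
    """Map a precision in bits to hypothetical from_pretrained keyword args.

    Assumption: None means no quantization, so the model loads at its
    default 16-bit precision and no extra arguments are passed.
    """
    if quantize_bits is None:
        return {}
    if quantize_bits == 8:
        return {"load_in_8bit": True}  # bitsandbytes 8-bit weights
    if quantize_bits == 4:
        return {"load_in_4bit": True}  # bitsandbytes 4-bit weights
    raise ValueError("Quantization precision bits should be either 4 or 8")
```

For example, `quantization_kwargs(4)` yields `{"load_in_4bit": True}`, while an unset value yields an empty dict and leaves loading unchanged.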

gavrissh · 2024-01-03T09:58:11Z

llm/kubeflow_inference_run.py

@@ -365,6 +369,8 @@ def execute(params: argparse.Namespace) -> None:
 mount_path = params.mount_path
 model_timeout = params.model_timeout

+ quantize_bits = params.quantize_bits


Extra whitespace on top.
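The diff only shows that the parsed namespace exposes a `quantize_bits` attribute. A minimal sketch of how such an argument might be declared with argparse follows; the `--quantize_bits` flag name and help text are assumptions, since how llm/run.sh forwards '-q' to the Python script is not shown here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical flag name; only the attribute name `quantize_bits`
    # is confirmed by the diff.
    parser = argparse.ArgumentParser(description="Deploy an LLM inference service")
    parser.add_argument(
        "--quantize_bits",
        type=int,       # cast once at parse time
        default=None,   # None means no quantization (default precision)
        help="Quantization precision in bits (4 or 8)",
    )
    return parser


params = build_parser().parse_args(["--quantize_bits", "8"])
quantize_bits = params.quantize_bits  # mirrors the line added in execute()
```

Using `type=int` here would also remove the need for the later `int(quantize_bits)` cast.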

gavrissh · 2024-01-04T10:01:28Z

llm/handler.py

@@ -112,8 +112,29 @@ def initialize(self, context: ts.context.Context):
 self.tokenizer.padding_side = "left"
 logger.info("Tokenizer loaded successfully")

+ quantize_bits = self.get_env_value("NAI_QUANTIZATION")
+ quantize_bits = int(quantize_bits) if quantize_bits else quantize_bits


Keep it simple here. You are doing the check just to type cast:

if self.get_env_value("NAI_QUANTIZATION"):
    quantize_bits = int(self.get_env_value("NAI_QUANTIZATION"))

Changed as suggested.
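As a standalone sketch of the suggested pattern, the function below reads `NAI_QUANTIZATION` once and casts it only when it is set. It assumes `get_env_value` behaves like a plain environment lookup returning a falsy value when the variable is missing; the name `read_quantize_bits` is hypothetical. Unlike the one-line suggestion, it always binds a result (None when unset), so the caller does not need to initialize `quantize_bits` beforehand.

```python
import os
from typing import Mapping, Optional


def read_quantize_bits(env: Mapping[str, str] = os.environ) -> Optional[int]:
    """Return NAI_QUANTIZATION as an int, or None when unset or empty."""
    value = env.get("NAI_QUANTIZATION")
    # Type-cast only when a value is present; otherwise keep None.
    return int(value) if value else None
```

Passing a plain dict as `env` makes the lookup easy to exercise without touching the real environment.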

gavrissh · 2024-01-04T10:04:17Z

llm/kubeflow_inference_run.py

@@ -382,6 +387,15 @@ def execute(params: argparse.Namespace) -> None:
 model_info["repo_id"] = model_params["repo_id"]
 model_info["repo_version"] = check_if_valid_version(model_info, mount_path)

+ if quantize_bits and int(quantize_bits) not in [4, 8]:
+ print("## Quantization precision bits should be either 4 or 8")


Someone may ask why 16 is not accepted as well. Add a note, e.g.:

print("## Quantization precision bits should be either 4 or 8. Default precision used is 16")

Changed the message as suggested.
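The validation in the diff can be sketched as a small predicate that uses the same condition and the revised message. The diff is cut off after the `print`, so what the script does next (for example, exiting) is not shown; returning a boolean here is an assumption, and `is_valid_quantize_bits` is a hypothetical name.

```python
from typing import Optional

VALID_QUANTIZE_BITS = (4, 8)


def is_valid_quantize_bits(quantize_bits: Optional[str]) -> bool:
    """Apply the diff's condition: unset/empty is allowed, else 4 or 8.

    A non-numeric value raises ValueError from int().
    """
    if quantize_bits and int(quantize_bits) not in VALID_QUANTIZE_BITS:
        print("## Quantization precision bits should be either 4 or 8. "
              "Default precision used is 16")
        return False
    return True
```

Note that an empty or missing value passes the check, which matches the intent that omitting '-q' keeps the default 16-bit precision.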

Quantization Initial Commit

1700b8a

gavrissh reviewed Jan 3, 2024

GPU check for Quantization

5a5bd2b

gavrissh approved these changes Jan 3, 2024

removed 16 bit precision option

85b6eda

gavrissh self-requested a review January 4, 2024 09:58

gavrissh reviewed Jan 4, 2024

saileshd1402 added 2 commits January 4, 2024 11:39

minor message change

a36e022

linting change

bcef8ed

gavrissh approved these changes Jan 4, 2024

saileshd1402 added 2 commits January 5, 2024 05:10

merge with upstream - resolve conflict

2c7d454

logic fix

094c071

gavrissh self-requested a review January 5, 2024 06:07

johnugeorge merged commit d905b6f into nutanix:main Jan 5, 2024
2 checks passed

gavrissh approved these changes Jan 5, 2024
